3 research outputs found

    Transactional WaveCache: Towards Speculative and Out-of-Order DataFlow Execution of Memory Operations

    The WaveScalar is the first DataFlow Architecture that can efficiently provide the sequential memory semantics required by imperative languages. This work presents an alternative memory ordering mechanism for this architecture, the Transactional WaveCache. Our mechanism maintains the execution order of memory operations within blocks of code, called Waves, but adds the ability to speculatively execute, out-of-order, operations from different Waves. This ordering mechanism is inspired by progress in supporting Transactional Memories. Waves are considered as atomic regions and executed as nested transactions. A Wave that has finished executing all its memory operations can be committed as soon as the previous Waves are committed. If a hazard is detected in a speculative Wave, all the following Waves (children) are aborted and re-executed. We evaluate the WaveCache on a set of artificial benchmarks. If the benchmark does not access memory often, we could achieve speedups of around 90%. Speedups of 33.1% and 24% were observed on more memory-intensive applications, and slowdowns of up to 16% arise if memory bandwidth is a bottleneck. For an application full of WAW, WAR and RAW hazards, a speedup of 139.7% was verified. Comment: Submitted to ACM International Conference on Computing Frontiers 2008, http://www.computingfrontiers.org/, 20 pages
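The commit and abort rules in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Wave` class and the functions `commit_in_order` and `abort_following` are hypothetical names, and the sketch assumes that only the Waves after the one with the hazard are reset, as the abstract's wording suggests.

```python
from dataclasses import dataclass


@dataclass
class Wave:
    wave_id: int
    finished: bool = False  # True once all memory operations have executed


def commit_in_order(waves):
    """Return the ids of Waves that can commit, oldest first.

    A Wave commits only if it has finished and every earlier Wave
    has committed, so the scan stops at the first unfinished Wave.
    """
    committed = []
    for wave in waves:
        if not wave.finished:
            break
        committed.append(wave.wave_id)
    return committed


def abort_following(waves, hazard_index):
    """Reset every Wave after waves[hazard_index] for re-execution.

    Assumption: the Wave where the hazard was detected is left alone;
    only its children (the later Waves) are aborted.
    """
    aborted = []
    for wave in waves[hazard_index + 1:]:
        wave.finished = False
        aborted.append(wave.wave_id)
    return aborted


waves = [Wave(0, True), Wave(1, True), Wave(2, False), Wave(3, True)]
print(commit_in_order(waves))     # Wave 3 finished early but must wait for Wave 2
print(abort_following(waves, 1))  # hazard in Wave 1: Waves 2 and 3 re-execute
```

Wave 3 shows the speculative case: it may finish out of order, but it stays uncommitted until Wave 2 commits.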

    Couillard: Parallel Programming via Coarse-Grained Data-Flow Compilation

    Data-flow is a natural approach to parallelism. However, describing dependencies and control between fine-grained data-flow tasks can be complex and present unwanted overheads. TALM (TALM is an Architecture and Language for Multi-threading) introduces a user-defined coarse-grained parallel data-flow model, where programmers identify code blocks, called super-instructions, to be run in parallel and connect them in a data-flow graph. TALM has been implemented as a hybrid Von Neumann/data-flow execution system: the Trebuchet. We have observed that TALM's usefulness largely depends on how programmers specify and connect super-instructions. Thus, we present Couillard, a full compiler that creates, based on an annotated C program, a data-flow graph and C code corresponding to each super-instruction. We show that our toolchain allows one to benefit from data-flow execution and explore sophisticated parallel programming techniques with little effort. To evaluate our system we have executed a set of real applications on a large multi-core machine. Comparison with popular parallel programming methods shows competitive speedups, while providing an easier parallel programming approach. Comment: 10 pages, 5 figures
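As a rough illustration of coarse-grained data-flow execution, the sketch below treats plain Python functions as stand-ins for super-instructions and fires each one once all of its inputs are available. It does not use TALM, Trebuchet, or Couillard; `run_dataflow` and the node names are hypothetical, and the scheduler is sequential for simplicity, whereas a real runtime would dispatch ready nodes in parallel.

```python
from collections import deque


def run_dataflow(nodes, edges):
    """Execute a data-flow graph.

    nodes: dict mapping a node name to a function (the super-instruction).
    edges: list of (producer, consumer) pairs; a consumer receives its
           producers' results as arguments, in the order the edges appear.
    """
    producers = {name: [] for name in nodes}
    consumers = {name: [] for name in nodes}
    for src, dst in edges:
        producers[dst].append(src)
        consumers[src].append(dst)

    # Number of inputs each node is still waiting for.
    pending = {name: len(producers[name]) for name in nodes}
    ready = deque(name for name in nodes if pending[name] == 0)
    results = {}

    while ready:
        name = ready.popleft()
        args = [results[p] for p in producers[name]]
        results[name] = nodes[name](*args)
        for consumer in consumers[name]:
            pending[consumer] -= 1
            if pending[consumer] == 0:  # all operands have arrived
                ready.append(consumer)
    return results


nodes = {
    "load": lambda: [1, 2, 3, 4],
    "sum_lo": lambda xs: sum(xs[:2]),
    "sum_hi": lambda xs: sum(xs[2:]),
    "join": lambda a, b: a + b,
}
edges = [("load", "sum_lo"), ("load", "sum_hi"),
         ("sum_lo", "join"), ("sum_hi", "join")]
print(run_dataflow(nodes, edges)["join"])
```

Here `sum_lo` and `sum_hi` depend only on `load`, so they become ready at the same time and could run concurrently.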

    Exploring the Equivalence between Dynamic Dataflow Model and Gamma - General Abstract Model for Multiset mAnipulation

    With the growing search for computational models where the expression of parallelism occurs naturally, some paradigms arise as options for the next generation of computers. In this context, dynamic Dataflow and Gamma (General Abstract Model for Multiset mAnipulation) emerge as interesting computational model choices. In the dynamic Dataflow model, operations are performed as soon as their associated operands are available, without relying on a Program Counter to dictate the execution order of instructions. The Gamma paradigm is based on a parallel multiset rewriting scheme. It provides a non-deterministic execution model inspired by an abstract chemical machine metaphor, where operations are formulated as reactions that occur freely among matching elements belonging to the multiset. In this work, equivalence relations between the dynamic Dataflow and Gamma paradigms are exposed and explored, and methods to convert from the Dataflow to the Gamma paradigm and vice versa are provided. It is shown that vertices and edges of a dynamic Dataflow graph can correspond, respectively, to reactions and multiset elements in the Gamma paradigm. Implementation aspects of execution environments that could be mutually beneficial to both models are also discussed. This work gives the scientific community the possibility of taking advantage of both parallel programming models, adding versatility for researchers and developers. Finally, it is important to state that, to the best of our knowledge, the similarity relations between the dynamic Dataflow and Gamma models presented here have not been reported in any previous work. Comment: Study submitted to the IPDPS 2019 - IEEE International Parallel and Distributed Processing Symposium
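The multiset rewriting idea behind Gamma can be sketched in a few lines of Python. This is an illustrative toy, not the paper's conversion method: the function `gamma` and its `condition`/`action` parameters are hypothetical, reactions here take exactly two elements, and pairs are tried in a fixed order, whereas Gamma itself leaves the choice of reacting elements non-deterministic.

```python
from itertools import permutations


def gamma(multiset, condition, action):
    """Apply one binary reaction until no pair of elements can react.

    condition(x, y): True if the ordered pair (x, y) may react.
    action(x, y):    list of elements that replace x and y.
    """
    items = list(multiset)
    while True:
        for i, j in permutations(range(len(items)), 2):
            if condition(items[i], items[j]):
                products = action(items[i], items[j])
                # Remove the two reactants, then add the products.
                items = [v for k, v in enumerate(items) if k not in (i, j)]
                items.extend(products)
                break
        else:
            # No pair satisfied the condition: the multiset is stable.
            return items


# Maximum: whenever x >= y, the pair is replaced by x alone.
print(gamma([3, 1, 4, 1, 5], lambda x, y: x >= y, lambda x, y: [x]))

# Sum: any two elements react and are replaced by their sum.
print(gamma([3, 1, 4, 1, 5], lambda x, y: True, lambda x, y: [x + y]))
```

In terms of the correspondence stated in the abstract, each reaction plays the role of a Dataflow vertex, and the elements it consumes and produces play the role of values travelling along edges.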